CHAPTER  1
INTRODUCTION 

1.1.  History  and  Motivation 

Most  control  systems  problems  can  be  characterized  by  the  following 
two  basic  questions:  i)  Does  the  problem  consist  of  a  single  or  multiple 
decision  makers?  and  ii)  Does  the  system  under  consideration  have  known  or 
unknown  parameters?  Although  control  theory  for  single  decision  maker 
problems  has  been  well  developed  [1-5],  we  note  that  many  complicated 
single  decision  maker  problems  can  be  reformulated  into  simpler  multiple 
decision  maker  problems.  This  is  exemplified  by  the  concept  of  multimodeling 
strategies  for  large  scale  systems  [6].  This  concept  allows  a  large  scale 
system  to  be  controlled  by  multiple  decision  makers  using  various  simplified 
models  of  the  system.  Each  decision  maker  will  attempt  to  individually 
optimize  his  own  simplified  system,  but  due  to  modeling  uncertainties,  there 
is  no  assurance  that  optimization  by  each  individual  decision  maker  will 
lead  to  an  optimization  of  the  entire  system.  Therefore,  the  problem  at  hand 
falls  nicely  into  the  general  framework  of  stochastic  game  theory. 

Past  research  in  game  theory  has  concentrated  mainly  on  problems 
involving  systems  with  known  parameters  [7-9].  However,  game  theory  involving 
problems  with  some  uncertainties  in  the  system  parameters  appears  to  have 
widespread  applications  in  power  systems,  industrial  systems  (as  described 
above),  and  in  various  economic  and  military  fields  which  warrant  its 
consideration.  Consider,  for  example,  a  situation  where  several  independent 
firms  are  selling  similar  products  in  the  same  consumer  market.  Each  firm 
is  attempting  to  maximize  its  profit  function,  which  is  related  through  the 
market  structure  to  its  own  production  level  as  well  as  the  production 
level  of  its  competitors.  Realistically,  each  firm  should  have  complete
knowledge  of  its  own  profit  function,  but  not  necessarily  those  of  its 
competitors.  How  should  each  firm  proceed  to  operate?  Motivated  by  this 
problem  and  related  examples,  we  propose  to  study  in  this  thesis  methods 
for  solving  game  problems  with  some  uncertainties  in  the  system  parameters. 


1.2.  General  Problem  Description 

There  are  two  basic  types  of  game  problems:  Nash  games  and 
Stackelberg or leader-follower games. We will concentrate on a basic
two-player Stackelberg game, which is characterized by the fact that one of
the players, known as the leader, has access to more information than the
other player, known as the follower. At every stage of the game each player
is attempting to minimize his own cost functional. For additional
simplification we focus on the static case (no plant) so that the two cost
functionals depend solely upon the inputs of the two players. These two
restrictions should not cause any loss of generality because we believe that
in the future our results can be extended to both dynamical game problems and
game problems which involve additional players and levels of hierarchy.

In  our  problem  we  assume  that  each  player  has  complete  knowledge  of 
his  own  cost  functional.  In  addition,  we  assume  the  leader  knows  the  structure 
of  the  follower's  cost  functional,  leaving  various  weighting  parameters  as 
unknown.  For  a  general  Stackelberg  game  we  usually  have  interdependent  cost 
functionals,  and  thus  the  leader's  cost  will  usually  depend  upon  the 
follower's  input.  Consequently,  the  leader  will  attempt  to  use  his  superior 
position  to  try  and  influence  the  follower  to  react  in  a  manner  which  helps 
minimize the leader's cost. The leader can use his knowledge of the
follower's cost functional structure along with some estimates of the
unknown parameters to try and predict the follower's possible reaction.
This places the leader in a better position to minimize his cost because
he may now incorporate the follower's possible reaction into the optimization
procedure used to calculate his own input.

One  method  of  implementing  this  procedure  is  through  an  incentive 
control  structure.  In  this  method,  the  leader  applies  an  input  which  is 
functionally  dependent  upon  the  difference  between  the  follower's  actual 
input  and  the  follower's  input  desired  by  the  leader.  Incentive  control 
structures  have  already  been  applied  to  Stackelberg  games  with  known  system 
parameters [9,10]. The methods described in these papers do not provide a
unique leader's input structure, but the question of selecting a particular
input structure based upon a minimum sensitivity approach has been examined
[11]. In this thesis we seek to extend the use of incentive control structures
to Stackelberg games with unknown cost functionals.

Our approach to this problem is based upon the concept of certainty
equivalence [12] and the general theory of self-tuning regulators [13-16].
The algorithm we have devised to solve this problem is basically an adaptive
scheme which uses the output of a parameter estimator to self-tune the
leader's incentive control structure. A general block diagram of our


This  thesis  is  organized  into  five  chapters.  Chapter  1  is  an 
introduction  and  a  general  description  of  the  problem.  In  Chapter  2  we 
examine a first order Stackelberg game and develop two methods for iteratively
adjusting an incentive control structure. Both of these methods
generate  controls  which  converge  to  the  optimal  incentive  control.  Chapter 
3  concentrates  on  developing  a  corresponding  algorithm  for  a  similar  problem 
of  higher  dimensions.  It  also  describes  some  additional  considerations 
which  are  important  in  developing  a  higher  order  incentive  control  structure. 
Chapter  4  contains  some  numerical  simulation  examples  of  both  the  first  and 
second  order  algorithms  and  comments  on  their  performance.  The  thesis  is 
summarized  in  Chapter  5,  which  also  contains  some  thoughts  on  possible  areas 
for  further  research. 


CHAPTER  2 


SCALAR  STACKELBERG  GAMES  WITH  UNKNOWN  COST  FUNCTIONALS 

2.1.  Introduction 

In this chapter we study the problem of finding an optimal incentive
control for a scalar, static Stackelberg game with unknown cost
functionals. We begin by constructing a basic incentive input control
structure for the leader. Then we derive an expression for the optimal
incentive constant based upon estimates of the unknown system parameters.
It is not necessary to estimate the actual values of all of the unknown
system parameters since we seek only the information required to obtain the
actual optimal incentive constant. The majority of the chapter is devoted
to describing two separate parameter estimation schemes, each of which can
be used to iteratively produce controls which converge to the optimal
incentive control.


where $R_L$, $S_L$, $R_F$, and $S_F$ are positive constants. Clearly $J_L$
and $J_F$ are always $\ge 0$ and from the leader's viewpoint $(u^t, v^t)$ is
the optimal control pair.

The  information  structure  of  this  problem  is  such  that  each 
player  knows  his  own  cost  functional,  but  the  leader  also  knows  the  structure 
of  the  follower's  cost  functional.  The  leader  also  has  the  privilege  of 
requiring  the  follower  to  play  his  input  v  first.  The  leader  then  plays 
his input u and, subsequently, both players may compute their costs for that
particular stage. This process is then repeated until an equilibrium
condition is reached.

The leader cannot be assured of cooperation from the follower so
he will attempt to use his informational advantage to try and force the
follower to cooperate. The leader may obtain an estimate (model) of the
follower's cost functional by combining his knowledge of the structure of
$J_F$ along with estimates for the unknown weighting parameters. The leader
may now use this estimate of $J_F$ to simulate the follower's optimization at
each stage and thus predict the follower's rational reaction pattern. This
allows the leader to select his input u at each stage to optimize his cost
functional $J_L$ with respect to both his input u and the follower's possible
reactionary input v.

There are many possible methods of implementing this additional
information to try and enforce cooperation. We concentrate on an incentive
control input structure for the leader. With this structure the leader's
input u at each stage is functionally dependent upon the difference between
the follower's actual input and the follower's optimal input $v^t$ desired by
the leader. The mechanics of the game now proceed as follows [17]:
Step 1. At the beginning of each stage, the leader formulates his incentive
control input structure u(v), based on his parameter estimates, and
presents it to the follower.

Step 2. The follower calculates his input v by optimizing his cost
functional $J_F$, knowing that the leader's input will be given by the
function u(v).

Step 3. The follower plays his input v and the leader's resulting input u
is calculated from the input structure u(v). The players then
compute their costs for that stage.

At the end of each stage the leader uses the additional information acquired
during that stage to update his parameter estimates and then returns to
step 1. The game continues in this manner until an equilibrium point is
reached. For our purposes, we will define an equilibrium point as an input
pair $(u^e, v^e)$ such that each player will play the same input at the next
stage. If the leader applies the optimal incentive control input, then
obviously the optimal input pair $(u^t, v^t)$ will be the resulting
equilibrium point.


2.3.  Construction  of  an  Optimal  Incentive  Control 

The leader knows only the structure of $J_F$ and, therefore, he must
attempt to estimate the parameters $R_F$ and $S_F$. Let us denote by
$\hat{R}_F^{(i)}$ and $\hat{S}_F^{(i)}$ the leader's estimates of $R_F$ and
$S_F$ at stage $i$. From (2.1b) the leader's estimate of the follower's cost
functional at stage $i$ becomes

$$\hat{J}_F^{(i)} = u^2 \hat{R}_F^{(i)} + v^2 \hat{S}_F^{(i)}. \tag{2.2}$$

Let us collect the unknown system parameters $R_F$ and $S_F$ into a vector
denoted by $a$. Let the vector $\hat{a}^{(i)}$ consist of the estimates
$\hat{R}_F^{(i)}$ and $\hat{S}_F^{(i)}$.


At each stage $i$ consider a leader's input of the form

$$u^{(i)} = u^t + D(\hat{a}^{(i)})\,(v^{(i)} - v^t). \tag{2.3}$$

With this structure the leader's true input $u^{(i)}$ at stage $i$ deviates
from his desired optimal input $u^t$ by an incentive constant
$D(\hat{a}^{(i)})$ multiplied by the difference between the follower's actual
input $v^{(i)}$ at stage $i$ and $v^t$. Note that when $v^{(i)} = v^t$ we have
$u^{(i)} = u^t$ and $J_L$ is minimized. To achieve this minimum the leader
must select the appropriate incentive constant $D(\hat{a}^{(i)})$.


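Given a particular structure for $u^{(i)}$ as in (2.3), the follower
will select an input $v^{(i)}$ which minimizes $J_F$ as given in (2.1b). The
leader may simulate this minimization by using his estimate $\hat{J}_F^{(i)}$
given in (2.2) and computing the solution to

$$\frac{d\hat{J}_F^{(i)}}{d\hat{v}^{(i)}} = \frac{\partial \hat{J}_F^{(i)}}{\partial \hat{v}^{(i)}} + \frac{\partial \hat{J}_F^{(i)}}{\partial u^{(i)}}\,\frac{\partial u^{(i)}}{\partial \hat{v}^{(i)}} = 0 \tag{2.4}$$

where $\hat{v}^{(i)}$ represents the leader's estimate of the follower's input
$v^{(i)}$ at stage $i$. Solving this expression we obtain

$$\frac{d\hat{J}_F^{(i)}}{d\hat{v}^{(i)}} = 2\hat{v}^{(i)}\hat{S}_F^{(i)} + 2\hat{R}_F^{(i)} D(\hat{a}^{(i)})\left[u^t + D(\hat{a}^{(i)})(\hat{v}^{(i)} - v^t)\right] = 0 \tag{2.5}$$

which reduces to

$$\hat{v}^{(i)} = \frac{D(\hat{a}^{(i)})\left[D(\hat{a}^{(i)})v^t - u^t\right]}{D(\hat{a}^{(i)})^2 + \hat{S}_F^{(i)}/\hat{R}_F^{(i)}}. \tag{2.6}$$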
The leader wishes to select $D(\hat{a}^{(i)})$ so that $\hat{v}^{(i)} = v^t$
at each stage. Setting $\hat{v}^{(i)} = v^t$ in (2.6) produces

$$D(\hat{a}^{(i)}) = -\frac{\hat{S}_F^{(i)}\, v^t}{\hat{R}_F^{(i)}\, u^t} \tag{2.7}$$

as the optimal incentive constant. We can see that $D(\hat{a}^{(i)})$ does not
depend explicitly upon both $\hat{R}_F^{(i)}$ and $\hat{S}_F^{(i)}$, but merely
upon their ratio. Thus, the leader need only estimate the ratio

$$\hat{x}^{(i)} = \hat{S}_F^{(i)} / \hat{R}_F^{(i)}. \tag{2.8}$$

With this notation the expression for the optimal incentive constant in (2.7)
reduces to

$$D(\hat{x}^{(i)}) = -\hat{x}^{(i)}\,\frac{v^t}{u^t}. \tag{2.9}$$
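As a quick numerical check of (2.6) and (2.9), the following Python sketch evaluates the incentive constant for a given ratio estimate and confirms that the leader's predicted follower input then equals $v^t$. The function names and the numerical values are illustrative assumptions and do not appear in the original development.

```python
def incentive_constant(x_hat, u_t, v_t):
    """Incentive constant of (2.9): D = -x_hat * v_t / u_t (needs u_t != 0)."""
    return -x_hat * v_t / u_t


def leader_input(v, D, u_t, v_t):
    """Incentive structure of (2.3): u = u_t + D * (v - v_t)."""
    return u_t + D * (v - v_t)


def predicted_follower_input(D, r_hat, s_hat, u_t, v_t):
    """Leader's prediction of the follower's input, as in (2.6)."""
    return D * (D * v_t - u_t) / (D * D + s_hat / r_hat)


# Illustrative values (assumed): desired pair (u_t, v_t) and current estimates.
u_t, v_t = 2.0, 1.0
r_hat, s_hat = 4.0, 6.0

D = incentive_constant(s_hat / r_hat, u_t, v_t)
v_pred = predicted_follower_input(D, r_hat, s_hat, u_t, v_t)
print("D =", D)
print("predicted follower input =", v_pred)
print("resulting leader input =", leader_input(v_pred, D, u_t, v_t))
```

With the incentive constant chosen this way, the predicted follower input coincides with $v^t$ and the leader's input coincides with $u^t$, which is the design goal stated in the text.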


2.4.  Parameter  Estimations  and  Updating  Techniques 

The leader begins his estimation of $x = S_F/R_F$ by making an initial
guess $\hat{x}^{(0)}$. Since $S_F$ and $R_F$ were both positive, the ratio $x$
is always positive and the initial guess $\hat{x}^{(0)}$ can be constrained to
positive values. Using $\hat{x}^{(0)}$ the leader computes $D(\hat{x}^{(0)})$
via (2.9) and his input structure $u^{(0)}$ via (2.3), and presents it to the
follower. After the follower computes $v^{(0)}$, the true value of $u^{(0)}$
is obtained and both players may compute their costs for step zero. At the
end of this stage the leader must have a method for updating $\hat{x}^{(0)}$
which incorporates the new information acquired during stage zero. We now
consider the formulation of a general updating procedure to be used following
any stage $i$.
To update $\hat{x}^{(i)}$ we first calculate the follower's actual input
$v^{(i)}$ at step $i$, with $u^{(i)}$ taking the form given in (2.3) and
$D(\hat{a}^{(i)})$ as in (2.9). This calculation is similar to the one in
equations (2.4) and (2.5) and yields

$$v^{(i)} = \frac{D(\hat{a}^{(i)})\left[D(\hat{a}^{(i)})v^t - u^t\right]}{D(\hat{a}^{(i)})^2 + x}. \tag{2.10}$$

We would like to update $\hat{x}^{(i)}$ so that $v^{(i)}$ converges to
$\hat{v}^{(i)}$. The value of $\hat{v}^{(i)}$ was given in (2.6), and by our
selection of $D(\hat{a}^{(i)})$ this reduces to

$$\hat{v}^{(i)} = v^t. \tag{2.11}$$

Therefore, we actually want to update $\hat{x}^{(i)}$ in such a way that
$v^{(i)}$ converges to $\hat{v}^{(i)} = v^t$, the optimal follower's input
from the leader's viewpoint. We have devised two separate methods for
updating $\hat{x}^{(i)}$.
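The closed form (2.10) can be compared against a direct numerical minimization of the follower's cost. The sketch below assumes the follower's cost has the quadratic structure $J_F = R_F u^2 + S_F v^2$ used in (2.2); the golden-section routine, the function names, and the parameter values are illustrative assumptions.

```python
def follower_cost(v, D, r_f, s_f, u_t, v_t):
    """Assumed follower cost J_F = R_F*u^2 + S_F*v^2 with u given by (2.3)."""
    u = u_t + D * (v - v_t)
    return r_f * u * u + s_f * v * v


def follower_response(D, x, u_t, v_t):
    """Closed-form follower input of (2.10), with x = S_F / R_F."""
    return D * (D * v_t - u_t) / (D * D + x)


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Illustrative values (assumed): true weights and a deliberately wrong estimate.
r_f, s_f = 2.0, 3.0            # true ratio x = 1.5
u_t, v_t = 2.0, 1.0
x_hat = 1.0                    # leader's current estimate of x
D = -x_hat * v_t / u_t         # incentive constant from (2.9)

v_closed = follower_response(D, s_f / r_f, u_t, v_t)
v_numeric = golden_section_min(
    lambda v: follower_cost(v, D, r_f, s_f, u_t, v_t), -10.0, 10.0
)
print("closed form:", v_closed)
print("numerical  :", v_numeric)
```

Because the estimate differs from the true ratio here, the follower's actual input differs from $v^t$; this discrepancy is the information the leader uses to update his estimate.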


2.4.1.  Error  function  method

Consider the following function of $\hat{x}^{(i)}$, which measures the error
between $v^{(i)}$ and $\hat{v}^{(i)}$:

$$E_1(\hat{x}^{(i)}) = \tfrac{1}{2}\left(v^{(i)} - \hat{v}^{(i)}\right)^2 = \tfrac{1}{2}\left(v^{(i)} - v^t\right)^2. \tag{2.12}$$

Substituting (2.9) into (2.10) we obtain

$$v^{(i)} = \frac{\dfrac{(\hat{x}^{(i)})^2 (v^t)^3}{(u^t)^2} + \hat{x}^{(i)} v^t}{\dfrac{(\hat{x}^{(i)})^2 (v^t)^2}{(u^t)^2} + x} = \frac{(\hat{x}^{(i)})^2 (v^t)^3 + \hat{x}^{(i)} (u^t)^2 v^t}{(\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2}. \tag{2.13}$$

From (2.13) we can calculate

$$v^{(i)} - v^t = \frac{(u^t)^2 v^t \left(\hat{x}^{(i)} - x\right)}{(\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2}. \tag{2.14}$$


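Thus for $u^t \neq 0$ and $v^t \neq 0$ we have $v^{(i)} = v^t$ if and only if
$\hat{x}^{(i)} = x$. We also find by differentiating (2.14) and treating $x$
as a constant

$$\frac{\partial v^{(i)}}{\partial \hat{x}^{(i)}} = \frac{\left(2\hat{x}^{(i)} (v^t)^3 + (u^t)^2 v^t\right)\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right) - 2\hat{x}^{(i)} (v^t)^2 \left((\hat{x}^{(i)})^2 (v^t)^3 + \hat{x}^{(i)} (u^t)^2 v^t\right)}{\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)^2}. \tag{2.15}$$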
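A finite-difference comparison is a simple way to sanity-check the expressions for the follower's input, its deviation from $v^t$, and its derivative with respect to the estimate. The sketch below uses the forms of (2.13)-(2.15) as read from the text; the function names and test values are illustrative assumptions.

```python
def follower_input(x_hat, x, u_t, v_t):
    """Follower's input as a function of the estimate x_hat, as in (2.13)."""
    num = x_hat**2 * v_t**3 + x_hat * u_t**2 * v_t
    den = x_hat**2 * v_t**2 + x * u_t**2
    return num / den


def input_error(x_hat, x, u_t, v_t):
    """Deviation v - v_t, as in (2.14)."""
    return u_t**2 * v_t * (x_hat - x) / (x_hat**2 * v_t**2 + x * u_t**2)


def dv_dxhat(x_hat, x, u_t, v_t):
    """Derivative of the follower's input w.r.t. x_hat, as in (2.15)."""
    den = x_hat**2 * v_t**2 + x * u_t**2
    num = ((2 * x_hat * v_t**3 + u_t**2 * v_t) * den
           - 2 * x_hat * v_t**2 * (x_hat**2 * v_t**3 + x_hat * u_t**2 * v_t))
    return num / den**2


# Illustrative values (assumed).
x_hat, x, u_t, v_t = 1.0, 1.5, 2.0, 1.0
h = 1e-6

analytic = dv_dxhat(x_hat, x, u_t, v_t)
numeric = (follower_input(x_hat + h, x, u_t, v_t)
           - follower_input(x_hat - h, x, u_t, v_t)) / (2 * h)
print("analytic derivative:", analytic)
print("central difference :", numeric)
```

Agreement between the two values supports the algebra, although it does not replace the derivation itself.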


Let us restrict the domain of $E_1(\hat{x}^{(i)})$ to positive values of
$\hat{x}^{(i)}$. Note that $E_1(\hat{x}^{(i)})$ as given in (2.12) is a
positive definite function because $E_1(\hat{x}^{(i)}) = 0$ only when
$\hat{x}^{(i)} = x$, and $E_1(\hat{x}^{(i)}) > 0$ at all other values of
$\hat{x}^{(i)}$. Our goal of adjusting $\hat{x}^{(i)}$ such that $v^{(i)}$
converges to $v^t$ can now be accomplished by adjusting $\hat{x}^{(i)}$ such
that $E_1(\hat{x}^{(i)})$ converges to zero. We propose to update
$\hat{x}^{(i)}$ by using the following gradient technique on
$E_1(\hat{x}^{(i)})$:

$$\hat{x}^{(i+1)} = \hat{x}^{(i)} - \gamma \nabla_{\hat{x}^{(i)}} E_1 = \hat{x}^{(i)} - \gamma \left(v^{(i)} - v^t\right) \nabla_{\hat{x}^{(i)}} v^{(i)}. \tag{2.16}$$

At the end of each stage $i$, the leader knows the value of the input
$v^{(i)}$ which the follower has just applied, and can easily calculate the
quantity $v^{(i)} - v^t$. In addition, $\nabla_{\hat{x}^{(i)}} v^{(i)}$ can be
calculated from (2.15) by using the known quantities $v^t$ and $u^t$, the
current estimate $\hat{x}^{(i)}$, and by substituting $\hat{x}^{(i)}$ as an
estimate for the unknown constant $x$. Thus, the leader has enough
information to implement the updating procedure (2.16). We know that $x$ is
always positive so it is reasonable to require that $\hat{x}^{(i)}$ always
remain positive. This can be accomplished by an appropriate choice of the step
size $\gamma$. Our updating procedure leads to the following theorem.
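The updating procedure can be simulated stage by stage. In the sketch below the follower's input is generated from the true ratio, which the leader never sees directly; the leader only observes that input and forms the gradient with his own estimate substituted for the unknown ratio. The step size, the starting guesses, and all names are illustrative assumptions.

```python
def follower_input(x_hat, x, u_t, v_t):
    """Follower's actual input for the leader's estimate x_hat, as in (2.13)."""
    num = x_hat**2 * v_t**3 + x_hat * u_t**2 * v_t
    den = x_hat**2 * v_t**2 + x * u_t**2
    return num / den


def dv_dxhat_estimate(x_hat, u_t, v_t):
    """Derivative (2.15) with the unknown x replaced by the estimate x_hat."""
    num = u_t**4 * v_t * x_hat + x_hat**2 * u_t**2 * v_t**3
    den = x_hat**2 * v_t**2 + x_hat * u_t**2
    return num / den**2


def run_error_function_method(x0, x_true, u_t, v_t, gamma, stages):
    """Iterate the gradient update (2.16) and return all estimates."""
    x_hat = x0
    history = [x_hat]
    for _ in range(stages):
        v = follower_input(x_hat, x_true, u_t, v_t)   # observed by the leader
        x_hat -= gamma * (v - v_t) * dv_dxhat_estimate(x_hat, u_t, v_t)
        history.append(x_hat)
    return history


# Illustrative values (assumed): true ratio 1.5, starts above and below it.
from_above = run_error_function_method(3.0, 1.5, 2.0, 1.0, gamma=0.5, stages=100)
from_below = run_error_function_method(0.5, 1.5, 2.0, 1.0, gamma=0.5, stages=100)
print("final estimate from above:", from_above[-1])
print("final estimate from below:", from_below[-1])
```

For these assumed values the estimates approach the true ratio from one side only, without crossing it. A larger step size can cause overshoot, which is why the text requires a sufficiently small step.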
Theorem 1: The gradient technique for updating $\hat{x}^{(i)}$ based upon the
error function $E_1(\hat{x}^{(i)})$, applied with a sufficiently small step
size $\gamma$ and by using $\hat{x}^{(i)}$ as an estimate for $x$ in
$\nabla_{\hat{x}^{(i)}} v^{(i)}$, produces estimates $\hat{x}^{(i)}$ which
converge to the actual value of $x$.

Proof: The gradient algorithm for updating $\hat{x}^{(i)}$ is given by
equation (2.16). Substituting (2.14) and (2.15) into (2.16) we obtain the
updating equation

$$\hat{x}^{(i+1)} = \hat{x}^{(i)} - \gamma \left[\frac{R_F v^t \hat{x}^{(i)} - S_F v^t}{S_F + R_F (\hat{x}^{(i)})^2 \dfrac{(v^t)^2}{(u^t)^2}}\right] \left[\frac{(u^t)^4 v^t x - (\hat{x}^{(i)})^2 (u^t)^2 (v^t)^3 + 2 x \hat{x}^{(i)} (u^t)^2 (v^t)^3}{\left(x (u^t)^2 + (\hat{x}^{(i)})^2 (v^t)^2\right)^2}\right]. \tag{2.17}$$

The actual value of $x$ is unknown to the leader so we substitute our estimate
$\hat{x}^{(i)}$ for $x$ in (2.17) to produce

$$\hat{x}^{(i+1)} = \hat{x}^{(i)} - \gamma \left[\frac{R_F v^t \hat{x}^{(i)} - S_F v^t}{S_F + R_F (\hat{x}^{(i)})^2 \dfrac{(v^t)^2}{(u^t)^2}}\right] \left[\frac{(u^t)^4 v^t \hat{x}^{(i)} + (\hat{x}^{(i)})^2 (u^t)^2 (v^t)^3}{\left(\hat{x}^{(i)} (u^t)^2 + (\hat{x}^{(i)})^2 (v^t)^2\right)^2}\right]. \tag{2.18}$$

From (2.18) we see that the only equilibrium point for our updating scheme
(if $v^t, u^t \neq 0$) occurs when

$$R_F v^t \hat{x}^{(i)} - S_F v^t = 0 \;\Rightarrow\; \hat{x}^{(i)} = \frac{S_F}{R_F} = x. \tag{2.19}$$

From (2.19) we see that this updating method has the desired equilibrium
point.


To prove that $\hat{x}^{(i)}$ converges to $x$ we consider two separate cases:

i) Case 1: If $\hat{x}^{(i)} > x$.

ii) Case 2: If $\hat{x}^{(i)} < x$.
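For simplicity we rewrite (2.18) as

$$\hat{x}^{(i+1)} = \hat{x}^{(i)} - \gamma \Delta x \quad \text{where} \quad \Delta x = \left[\frac{R_F v^t \hat{x}^{(i)} - S_F v^t}{S_F + R_F (\hat{x}^{(i)})^2 \dfrac{(v^t)^2}{(u^t)^2}}\right] \left[\frac{(u^t)^4 v^t \hat{x}^{(i)} + (\hat{x}^{(i)})^2 (u^t)^2 (v^t)^3}{\left(\hat{x}^{(i)} (u^t)^2 + (\hat{x}^{(i)})^2 (v^t)^2\right)^2}\right]. \tag{2.20}$$

By requiring $\hat{x}^{(i)}$ to be positive at all times, and since $R_F$ and
$S_F$ are positive constants, we see that the denominator of $\Delta x$ is
always positive. Therefore, we concentrate our interest on the numerator of
$\Delta x$ because sign $\Delta x$ = sign $\Delta x_{NUM}$. From (2.20) we have

$$\Delta x_{NUM} = R_F (u^t)^2 (v^t)^2 \left(\hat{x}^{(i)} - x\right)\left((u^t)^2 \hat{x}^{(i)} + (v^t)^2 (\hat{x}^{(i)})^2\right). \tag{2.21}$$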

Examination of the case $\hat{x}^{(i)} > x$

In this case we want to show that $\Delta x > 0$. It is obvious that the
first two terms in (2.21) are positive because $R_F$ is a positive constant
and $\hat{x}^{(i)} - x$ is also positive. Furthermore, the third term of
(2.21) is also positive by our previous requirement that $\hat{x}^{(i)}$ be
positive at all times. The product of three positive quantities is always
positive so (2.21) implies that $\Delta x$ is positive whenever
$\hat{x}^{(i)}$ is greater than $x$. Thus, whenever $\hat{x}^{(i)}$ is
greater than $x$, our updating procedure (2.20) will produce an updated
estimate $\hat{x}^{(i+1)}$ which is less than $\hat{x}^{(i)}$ as desired.

Examination of the case $\hat{x}^{(i)} < x$

In this case we want to show that $\Delta x < 0$. As in the previous case we
find that the first and third terms of (2.21) are positive. However, since
$\hat{x}^{(i)}$ is less than $x$, the second term in (2.21),
$\hat{x}^{(i)} - x$, is negative. This makes the expression $\Delta x_{NUM}$
in (2.21) negative whenever $\hat{x}^{(i)}$ is less than $x$. Therefore, if
$\hat{x}^{(i)} < x$ our updating procedure (2.20) will produce an updated
estimate $\hat{x}^{(i+1)}$ which is greater than $\hat{x}^{(i)}$ as desired.

We have just shown that if $\hat{x}^{(i)}$ is greater than (less than) $x$
then our updating procedure (2.20) produces a next estimate $\hat{x}^{(i+1)}$
which is less than (greater than) $\hat{x}^{(i)}$. By choosing $\gamma$ small
enough such that $\hat{x}^{(i+1)} - x$ has the same sign as
$\hat{x}^{(i)} - x$, we insure that the differences $|\hat{x}^{(i)} - x|$ form
a monotonically decreasing sequence of errors which converges to its lower
bound of zero. Therefore, for a properly chosen constant $\gamma$ the
updating scheme (2.20) produces estimates $\hat{x}^{(i)}$ which converge to
$x$. This completes the proof.

This updating scheme produces estimates $\hat{x}^{(i)}$ which display a
one-sided convergence to $x$, with the direction of convergence depending upon
the selection of $\hat{x}^{(0)}$. If $\hat{x}^{(0)} > x$ then $\hat{x}^{(i)}$
converges to $x$ from above and if $\hat{x}^{(0)} < x$ then $\hat{x}^{(i)}$
converges to $x$ from below. Note that when $\hat{x}^{(i)}$ converges to $x$,
$D(\hat{a}^{(i)})$ in (2.7) converges to the optimal incentive constant

$$D = -x\,\frac{v^t}{u^t} = -\frac{S_F\, v^t}{R_F\, u^t}. \tag{2.22}$$

A leader's input structure of the form (2.3) using (2.22) will generate the
optimal input pair $(u^t, v^t)$.

2.4.2.  Gradient  on  $J_L$  method

A second method of updating $\hat{x}^{(i)}$ uses a gradient technique which
is based upon minimizing $J_L$. Intuitively this method is the most logical
one because it adjusts $\hat{x}^{(i)}$ in a manner which creates the greatest
decrease in the value of $J_L$, which is precisely the overall goal. This
updating scheme is given by

$$\hat{x}^{(i+1)} = \hat{x}^{(i)} - \gamma \nabla_{\hat{x}^{(i)}} J_L. \tag{2.23}$$

Using (2.1a), (2.3) and (2.9) we can rewrite $J_L$ as

$$J_L = D(\hat{a}^{(i)})^2 R_L \left(v^{(i)} - v^t\right)^2 + S_L \left(v^{(i)} - v^t\right)^2 = \left(\frac{(\hat{x}^{(i)})^2 (v^t)^2}{(u^t)^2} R_L + S_L\right)\left(v^{(i)} - v^t\right)^2. \tag{2.24}$$

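We must be careful when calculating $\nabla_{\hat{x}^{(i)}} J_L$ to include
the dependence of both $D(\hat{a}^{(i)})$ and $v^{(i)}$ upon $\hat{x}^{(i)}$.
By the chain rule we have

$$\nabla_{\hat{x}^{(i)}} J_L = \frac{dJ_L}{d\hat{x}^{(i)}} = \frac{\partial J_L}{\partial v^{(i)}}\,\frac{\partial v^{(i)}}{\partial \hat{x}^{(i)}} + \frac{\partial J_L}{\partial D(\hat{a}^{(i)})}\,\frac{\partial D(\hat{a}^{(i)})}{\partial \hat{x}^{(i)}}. \tag{2.25}$$

From (2.9) and (2.24) we can find

$$\frac{\partial D(\hat{a}^{(i)})}{\partial \hat{x}^{(i)}} = -\frac{v^t}{u^t} \tag{2.26}$$

$$\frac{\partial J_L}{\partial D(\hat{a}^{(i)})} = 2 D(\hat{a}^{(i)}) R_L \left(v^{(i)} - v^t\right)^2 = -2\hat{x}^{(i)}\,\frac{v^t}{u^t}\, R_L \left(v^{(i)} - v^t\right)^2 \tag{2.27}$$

$$\frac{\partial J_L}{\partial v^{(i)}} = 2\left(v^{(i)} - v^t\right)\left[D(\hat{a}^{(i)})^2 R_L + S_L\right] = 2\left(v^{(i)} - v^t\right)\left[\frac{(\hat{x}^{(i)})^2 (v^t)^2}{(u^t)^2} R_L + S_L\right]. \tag{2.28}$$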

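Substituting (2.26)-(2.28) into (2.25) we can write

$$\nabla_{\hat{x}^{(i)}} J_L = 2\hat{x}^{(i)} R_L \frac{(v^t)^2}{(u^t)^2}\left(v^{(i)} - v^t\right)^2 + 2\left(v^{(i)} - v^t\right)\left[\frac{(\hat{x}^{(i)})^2 (v^t)^2}{(u^t)^2} R_L + S_L\right]\frac{\partial v^{(i)}}{\partial \hat{x}^{(i)}}. \tag{2.29}$$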
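The chain-rule gradient can be checked numerically against a finite difference of the leader's cost. The sketch below assumes the leader's cost reduces to $(D^2 R_L + S_L)(v - v^t)^2$ under the incentive structure, as in (2.24), and it uses the true ratio $x$ in the derivative of the follower's input purely to verify the calculus (the leader himself would not know $x$). All names and values are illustrative assumptions.

```python
def follower_input(x_hat, x, u_t, v_t):
    """Follower's input as a function of the estimate x_hat, as in (2.13)."""
    num = x_hat**2 * v_t**3 + x_hat * u_t**2 * v_t
    den = x_hat**2 * v_t**2 + x * u_t**2
    return num / den


def dv_dxhat(x_hat, x, u_t, v_t):
    """Exact derivative of the follower's input w.r.t. x_hat, as in (2.15)."""
    den = x_hat**2 * v_t**2 + x * u_t**2
    num = ((2 * x_hat * v_t**3 + u_t**2 * v_t) * den
           - 2 * x_hat * v_t**2 * (x_hat**2 * v_t**3 + x_hat * u_t**2 * v_t))
    return num / den**2


def leader_cost(x_hat, x, u_t, v_t, r_l, s_l):
    """Leader's cost under the incentive structure, as in (2.24)."""
    D = -x_hat * v_t / u_t
    dv = follower_input(x_hat, x, u_t, v_t) - v_t
    return (D * D * r_l + s_l) * dv * dv


def grad_leader_cost(x_hat, x, u_t, v_t, r_l, s_l):
    """Chain-rule gradient assembled from (2.25)-(2.28)."""
    D = -x_hat * v_t / u_t
    dv = follower_input(x_hat, x, u_t, v_t) - v_t
    dD_dx = -v_t / u_t                         # (2.26)
    dJ_dD = 2 * D * r_l * dv * dv              # (2.27)
    dJ_dv = 2 * dv * (D * D * r_l + s_l)       # (2.28)
    return dJ_dv * dv_dxhat(x_hat, x, u_t, v_t) + dJ_dD * dD_dx   # (2.25)


# Illustrative values (assumed).
x_hat, x, u_t, v_t, r_l, s_l = 2.0, 1.5, 2.0, 1.0, 1.0, 1.0
h = 1e-6
analytic = grad_leader_cost(x_hat, x, u_t, v_t, r_l, s_l)
numeric = (leader_cost(x_hat + h, x, u_t, v_t, r_l, s_l)
           - leader_cost(x_hat - h, x, u_t, v_t, r_l, s_l)) / (2 * h)
print("chain-rule gradient:", analytic)
print("central difference :", numeric)
```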


At the end of each stage $i$, the values of the inputs $v^{(i)}$ and
$u^{(i)}$ are known so all of the quantities in (2.26)-(2.28) can easily be
calculated by the leader. The quantity
$\partial v^{(i)} / \partial \hat{x}^{(i)}$ is found by differentiating (2.13)
as before, and treating $x$ as a constant. The resulting expression is the
same as in (2.15). Once again since the actual value of $x$ is unknown to the
leader, he replaces $x$ in (2.15) by his estimate $\hat{x}^{(i)}$ which yields

$$\frac{\partial v^{(i)}}{\partial \hat{x}^{(i)}} = \frac{(u^t)^4 v^t \hat{x}^{(i)} + (\hat{x}^{(i)})^2 (u^t)^2 (v^t)^3}{\left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right)^2}. \tag{2.30}$$


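Using this expression for $\partial v^{(i)} / \partial \hat{x}^{(i)}$ and the
expression for $v^{(i)} - v^t$ in (2.14), we write (2.29) as

$$\nabla_{\hat{x}^{(i)}} J_L = 2\hat{x}^{(i)} R_L \frac{(v^t)^2}{(u^t)^2}\left[\frac{(u^t)^2 v^t (\hat{x}^{(i)} - x)}{(\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2}\right]^2 + 2\left[\frac{(\hat{x}^{(i)})^2 (v^t)^2 R_L}{(u^t)^2} + S_L\right]\left[\frac{(u^t)^2 v^t (\hat{x}^{(i)} - x)}{(\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2}\right]\left[\frac{(u^t)^4 v^t \hat{x}^{(i)} + (\hat{x}^{(i)})^2 (u^t)^2 (v^t)^3}{\left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right)^2}\right]$$

$$= \frac{\left[2(\hat{x}^{(i)} - x)(u^t)^2 (v^t)^2 \left((u^t)^2 \hat{x}^{(i)} + (\hat{x}^{(i)})^2 (v^t)^2\right)\right]\left[R_L (v^t)^2 (\hat{x}^{(i)} - x)\left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right) + \left((\hat{x}^{(i)})^2 (v^t)^2 R_L + S_L (u^t)^2\right)\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)\right]}{\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)^2 \left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right)^2}. \tag{2.31}$$

In the updating equation (2.23) let us again choose $\gamma$ sufficiently
small such that $\hat{x}^{(i)}$ remains positive at all stages of the
iteration. This leads to the following theorem.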

Theorem 2: For an initial guess $\hat{x}^{(0)}$ large enough such that
$\hat{x}^{(0)} > x$, the iterative scheme detailed in (2.23), applied by using
an appropriately small step size $\gamma$ and by substituting $\hat{x}^{(i)}$
as an estimate for $x$ in the expression
$\partial v^{(i)} / \partial \hat{x}^{(i)}$, produces estimates
$\hat{x}^{(i)}$ which converge to the true value of $x$.

Proof: The algorithm for updating $\hat{x}^{(i)}$, given by (2.23), uses
$\nabla_{\hat{x}^{(i)}} J_L$ which takes the form of (2.31).

From (2.31) we see that our updating scheme has equilibrium
points (if $v^t, u^t \neq 0$) when

$$\hat{x}^{(i)} = x - \frac{\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)^2}{(v^t)^2 \left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right)} \tag{2.32a}$$

$$\hat{x}^{(i)} = x. \tag{2.32b}$$

Obviously, (2.32b) is the desired equilibrium point while (2.32a) is an
undesirable equilibrium point which we seek to avoid. By choosing $\gamma$
small enough such that $\hat{x}^{(i)} - x > 0$ for all $i$, we can disregard
the equilibrium point (2.32a) because it is smaller than $x$.

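It is obvious that the denominator of $\nabla_{\hat{x}^{(i)}} J_L$ in (2.31)
is positive, so we concentrate our attention on the numerator because
sign $\nabla_{\hat{x}^{(i)}} J_L$ = sign $\nabla_{\hat{x}^{(i)}} J_{L,NUM}$.
From (2.31) we have

$$\nabla_{\hat{x}^{(i)}} J_{L,NUM} = \left[2 (u^t)^2 (v^t)^2 \left((u^t)^2 \hat{x}^{(i)} + (\hat{x}^{(i)})^2 (v^t)^2\right)(\hat{x}^{(i)} - x)\right]\left[R_L (v^t)^2 (\hat{x}^{(i)} - x)\left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right) + \left((\hat{x}^{(i)})^2 (v^t)^2 R_L + S_L (u^t)^2\right)\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)\right]. \tag{2.33}$$

We want to show that $\nabla_{\hat{x}^{(i)}} J_{L,NUM}$ in (2.33) is positive.
The difference $\hat{x}^{(i)} - x$ is always greater than zero by our choice
of $\gamma$ and $\hat{x}^{(0)}$, so by inspection, we see that all of the
terms in (2.33) are positive. This makes both
$\nabla_{\hat{x}^{(i)}} J_{L,NUM}$ and $\nabla_{\hat{x}^{(i)}} J_L$ positive.
Therefore, whenever $\hat{x}^{(i)} > x$ our updating scheme (2.23) produces an
updated estimate $\hat{x}^{(i+1)}$ such that
$\hat{x}^{(i+1)} < \hat{x}^{(i)}$ as desired. This completes the proof.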
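The behaviour claimed by Theorem 2 can be illustrated with a short simulation started above the true ratio. In the sketch below the leader observes the follower's input at each stage and evaluates the gradient of his cost with his own estimate substituted for the unknown ratio in the derivative of the follower's input. The cost weights, step size, starting guess, and all names are illustrative assumptions.

```python
def follower_input(x_hat, x, u_t, v_t):
    """Follower's actual input for the leader's estimate x_hat, as in (2.13)."""
    num = x_hat**2 * v_t**3 + x_hat * u_t**2 * v_t
    den = x_hat**2 * v_t**2 + x * u_t**2
    return num / den


def estimated_gradient(x_hat, v, u_t, v_t, r_l, s_l):
    """Gradient (2.29) using the observed v and the derivative estimate (2.30)."""
    dv = v - v_t
    weight = x_hat**2 * v_t**2 * r_l / u_t**2 + s_l
    den = x_hat**2 * v_t**2 + x_hat * u_t**2
    dv_dx = (u_t**4 * v_t * x_hat + x_hat**2 * u_t**2 * v_t**3) / den**2
    return 2 * x_hat * r_l * (v_t**2 / u_t**2) * dv**2 + 2 * dv * weight * dv_dx


def run_gradient_on_leader_cost(x0, x_true, u_t, v_t, r_l, s_l, gamma, stages):
    """Iterate the update (2.23) and return all estimates."""
    x_hat = x0
    history = [x_hat]
    for _ in range(stages):
        v = follower_input(x_hat, x_true, u_t, v_t)   # observed by the leader
        x_hat -= gamma * estimated_gradient(x_hat, v, u_t, v_t, r_l, s_l)
        history.append(x_hat)
    return history


# Illustrative values (assumed): true ratio 1.5, initial guess above it.
estimates = run_gradient_on_leader_cost(
    3.0, 1.5, 2.0, 1.0, r_l=1.0, s_l=1.0, gamma=0.5, stages=40
)
print("first estimates:", estimates[:4])
print("final estimate :", estimates[-1])
```

For these assumed values the estimates decrease at every stage and stay above the true ratio. The simulation says nothing about starting guesses below the true ratio.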

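Unfortunately, if $\hat{x}^{(0)}$ is chosen such that $\hat{x}^{(0)} < x$,
then we can no longer guarantee that the estimates $\hat{x}^{(i)}$ produced by
the algorithm in (2.23) converge to the actual value of $x$. We can, however,
prove the following theorem.

Theorem 3: For an initial guess $\hat{x}^{(0)} < x$, the iterative scheme of
(2.23), applied with an appropriately small step size $\gamma$ and by
substituting $\hat{x}^{(i)}$ as an estimate for $x$ in the expression
$\partial v^{(i)} / \partial \hat{x}^{(i)}$, produces estimates
$\hat{x}^{(i)}$ which converge to the true value of $x$ provided
$\hat{x}^{(0)}$ satisfies the following conditions

$$\hat{x}^{(0)} - x > F(\hat{x}^{(0)}) = -\frac{\left((\hat{x}^{(0)})^2 (v^t)^2 + x (u^t)^2\right)^2}{(v^t)^2 \left((\hat{x}^{(0)})^2 (v^t)^2 + \hat{x}^{(0)} (u^t)^2\right)} \tag{2.34}$$

$$\frac{\partial F(\hat{x}^{(0)})}{\partial \hat{x}^{(0)}} = \frac{x^2 (u^t)^6 + 2 x^2 \hat{x}^{(0)} (v^t)^2 (u^t)^4 - 3 (\hat{x}^{(0)})^4 (v^t)^4 (u^t)^2 - 2 (\hat{x}^{(0)})^5 (v^t)^6 - 2 (\hat{x}^{(0)})^2 x (v^t)^2 (u^t)^4}{(v^t)^2 \left((\hat{x}^{(0)})^2 (v^t)^2 + \hat{x}^{(0)} (u^t)^2\right)^2} < 0. \tag{2.35}$$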

Proof: Once again the updating algorithm (2.23) uses the expression
$\nabla_{\hat{x}^{(i)}} J_L$ given in (2.31). Consequently, this updating
scheme has the same two equilibrium points given in (2.32a,b). The
denominator of $\nabla_{\hat{x}^{(i)}} J_L$ in (2.31) is positive by
inspection so sign $\nabla_{\hat{x}^{(i)}} J_L$ = sign
$\nabla_{\hat{x}^{(i)}} J_{L,NUM}$. This allows us to concentrate on the
reduced expression in (2.33).

We want to show that $\nabla_{\hat{x}^{(i)}} J_{L,NUM}$ in (2.33) is negative.
However, this situation is more complicated than in Theorem 2 because we must
also avoid the undesirable equilibrium point given in (2.32a). We can insure
that the estimates $\hat{x}^{(i)}$ are always smaller than $x$ by an
appropriate choice of $\gamma$. This restricts the first term in (2.33) to
negative values. Therefore, a necessary condition for
$\nabla_{\hat{x}^{(i)}} J_{L,NUM}$ to be negative is that the second term in
(2.33) be positive.

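Our necessary condition is

$$(v^t)^2 (\hat{x}^{(i)} - x)\left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right) + \left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)^2 > 0 \tag{2.36}$$

which reduces to

$$\hat{x}^{(i)} > x - \frac{\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)^2}{(v^t)^2 \left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right)}. \tag{2.37}$$

The expression on the right hand side of (2.37) is precisely the value of
the undesirable equilibrium point given in (2.32a). We see that if (2.37)
holds then $\nabla_{\hat{x}^{(i)}} J_{L,NUM} < 0$ and the updating scheme in
(2.23) will produce an $\hat{x}^{(i+1)}$ which is greater than
$\hat{x}^{(i)}$ as desired. We now seek to find conditions on $\hat{x}^{(0)}$
to insure that $\hat{x}^{(i)}$ converges to $x$.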

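Rewrite (2.37) as

$$\hat{x}^{(i)} - x > -\frac{\left((\hat{x}^{(i)})^2 (v^t)^2 + x (u^t)^2\right)^2}{(v^t)^2 \left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right)} = F(\hat{x}^{(i)}). \tag{2.38}$$

Now differentiate $F(\hat{x}^{(i)})$ in (2.38) with respect to $\hat{x}^{(i)}$
to obtain

$$\frac{\partial F(\hat{x}^{(i)})}{\partial \hat{x}^{(i)}} = \frac{x^2 (u^t)^6 + 2 x^2 \hat{x}^{(i)} (v^t)^2 (u^t)^4 - 3 (\hat{x}^{(i)})^4 (u^t)^2 (v^t)^4 - 2 (\hat{x}^{(i)})^5 (v^t)^6 - 2 (\hat{x}^{(i)})^2 x (v^t)^2 (u^t)^4}{(v^t)^2 \left((\hat{x}^{(i)})^2 (v^t)^2 + \hat{x}^{(i)} (u^t)^2\right)^2}. \tag{2.39}$$

Notice that the denominator of this equation is always positive
and that as $\hat{x}^{(i)}$ increases, the numerator decreases. Consequently,
if $\partial F(\hat{x}^{(i)}) / \partial \hat{x}^{(i)}$ is negative, and
$\hat{x}^{(i)} < \hat{x}^{(i+1)}$, then
$\partial F(\hat{x}^{(i+1)}) / \partial \hat{x}^{(i+1)}$ must also be
negative. Thus for a negative value of
$\partial F(\hat{x}^{(i)}) / \partial \hat{x}^{(i)}$ we can conclude that if
(2.38) holds for $\hat{x}^{(i)}$ then it must also hold for
$\hat{x}^{(i+1)}$ since

$$\hat{x}^{(i+1)} - x > \hat{x}^{(i)} - x \quad \text{and} \quad F(\hat{x}^{(i+1)}) < F(\hat{x}^{(i)}). \tag{2.40}$$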
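The two sufficiency conditions can be evaluated numerically for a candidate initial guess. The sketch below assumes the forms of $F$ and its derivative as read from (2.38) and (2.39), checks the derivative against a central difference, and reports whether both conditions hold. It requires the true ratio $x$, so it is an offline check rather than something the leader could compute; all names and values are illustrative assumptions.

```python
def F(x_hat, x, u_t, v_t):
    """Right-hand side of (2.38)."""
    a, b = v_t**2, u_t**2
    return -(x_hat**2 * a + x * b) ** 2 / (a * (x_hat**2 * a + x_hat * b))


def dF_dxhat(x_hat, x, u_t, v_t):
    """Derivative of F with respect to x_hat, as in (2.39)."""
    a, b = v_t**2, u_t**2
    num = (x**2 * b**3 + 2 * x**2 * x_hat * a * b**2
           - 3 * x_hat**4 * a**2 * b - 2 * x_hat**5 * a**3
           - 2 * x_hat**2 * x * a * b**2)
    den = a * (x_hat**2 * a + x_hat * b) ** 2
    return num / den


def theorem3_conditions(x0, x, u_t, v_t):
    """Return (condition on F holds, derivative of F is negative) for guess x0."""
    return (x0 - x > F(x0, x, u_t, v_t), dF_dxhat(x0, x, u_t, v_t) < 0)


# Illustrative values (assumed): same guess and ratio, two desired input pairs.
h = 1e-6
analytic = dF_dxhat(1.0, 1.5, 2.0, 1.0)
numeric = (F(1.0 + h, 1.5, 2.0, 1.0) - F(1.0 - h, 1.5, 2.0, 1.0)) / (2 * h)
print("dF/dx_hat analytic vs numeric:", analytic, numeric)
print("conditions for (u_t, v_t) = (2, 1):", theorem3_conditions(1.0, 1.5, 2.0, 1.0))
print("conditions for (u_t, v_t) = (1, 2):", theorem3_conditions(1.0, 1.5, 1.0, 2.0))
```

In this example the same initial guess satisfies both conditions for one desired input pair but only the first condition for the other, so whether the conditions hold depends on the problem data as well as on the guess.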


Combining these results, we can guarantee that whenever $\hat{x}^{(i)} < x$
our algorithm (2.23), applied as stated in Theorem 3, will converge to $x$ if
both equation (2.38) holds and
$\partial F(\hat{x}^{(i)}) / \partial \hat{x}^{(i)}$, as given in (2.39), is
negative. If we select $\gamma$ small enough such that $\hat{x}^{(i)} - x$
and $\hat{x}^{(i+1)} - x$ have the same sign, then we may generalize the above
conditions to $\hat{x}^{(0)}$. Therefore, if (2.38) holds for $\hat{x}^{(0)}$
and (2.39) is negative for $\hat{x}^{(0)}$, then our subsequent estimates
$\hat{x}^{(i)}$ will converge to $x$. These are precisely the sufficiency
conditions given in the statement of the theorem and thus the theorem is
proved.


It is interesting to note that the sufficiency condition (2.34)
of Theorem 3 explicitly requires that the initial guess be larger than the
undesirable equilibrium point given by (2.32a). In addition, the
sufficiency condition (2.35) which states that
$\partial F(\hat{x}^{(0)}) / \partial \hat{x}^{(0)} < 0$ is somewhat
analogous to insuring a locally convex function for our gradient technique.
However, the leader has no a priori information about the value of $x$ and
thus has no way of knowing whether his initial guess will satisfy the
sufficiency conditions of Theorem 3. Therefore, in applications it is more
logical for the leader to select a somewhat larger value for his initial
guess $\hat{x}^{(0)}$, since Theorem 2 guarantees convergence to the actual
value of $x$ from above. In general, the convergence, if it occurs, will be
one-sided and will depend upon whether $\hat{x}^{(0)}$ is greater than or less
than $x$. If $\hat{x}^{(i)}$ does converge to $x$ we are again assured that
$D(\hat{a}^{(i)})$ will converge to the
CHAPTER 3

HIGHER  ORDER  STACKELBERG  GAMES  WITH  UNKNOWN  COST  FUNCTIONALS 

3.1. Introduction

Chapter  2  developed  methods  for  finding  an  optimal  incentive 
control  for  scalar,  static,  two-player  Stackelberg  games  with  unknown  cost 
functionals.  We  now  seek  to  develop  corresponding  methods  for  similar 
problems  of  higher  dimension.  In  this  chapter  we  concentrate  on  a  second 
order  game  and  again  adopt  a  basic  incentive  input  control  structure  for  the 
leader.  In  this  case  we  no  longer  have  an  incentive  constant,  but  rather  a 
2x2  incentive  matrix  which  provides  great  flexibility.  The  leader  is  free 
to  select  the  structure  of  this  incentive  matrix,  the  elements  of  which 
depend  upon  his  estimates  of  the  unknown  cost  functionals.  The  resulting 
optimal  incentive  matrix  may  or  may  not  be  unique,  depending  upon  the 
structure  chosen  by  the  leader.  Generally,  the  leader  may  use  his  freedom 
in  selecting  the  incentive  matrix  to  satisfy  specific  design  considerations 
as  described  at  the  end  of  the  chapter. 

Paralleling  the  scalar  case,  it  is  again  unnecessary  to  precisely 
estimate  the  actual  values  of  the  unknown  system  parameters.  It  suffices  to 
obtain  enough  information  about  these  parameters  to  find  the  elements  of  the 
optimal incentive matrix. The parameter estimate updating method, which applies a gradient technique to the leader's cost $J_L$, gave good convergence results in simulation; its details are described in Section 3.4.




3.2.  Problem  Formulation 


We now consider a general problem similar to that of Chapter 2 but with higher dimensions. Assume that the control inputs are given by the 2x1 vectors

$$u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \ \text{for the leader} \quad \text{and} \quad v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \ \text{for the follower.}$$

Each player again tries to minimize at each stage his own quadratic cost functional given by

$$J_L = (u - u^t)' R_L (u - u^t) + (v - v^t)' S_L (v - v^t) \quad \text{for the leader} \tag{3.1a}$$

$$J_F = u' R_F u + v' S_F v \quad \text{for the follower} \tag{3.1b}$$

where $R_L$, $S_L$, $R_F$, and $S_F$ are all symmetric, positive definite 2x2 matrices. Thus, $J_F$ and $J_L$ are always $\geq 0$ and from the leader's viewpoint an optimal solution is given by the input control pair

$$u^t = \begin{bmatrix} u_1^t \\ u_2^t \end{bmatrix} \quad \text{and} \quad v^t = \begin{bmatrix} v_1^t \\ v_2^t \end{bmatrix}.$$


We  assume  that  the  information  structure  of  this  problem  and 
the  steps  followed  in  playing  the  game  are  the  same  as  described  in 
Section 2.2. At the end of each stage the leader will use the new information
obtained during that stage to update his parameter estimates and the
next  stage  of  the  game  will  begin.  The  players  will  continue  to  play  until 
an  equilibrium  point  is  reached  as  previously  defined. 
3.3. Construction of an Optimal Incentive Control


The weighting matrices $R_F$ and $S_F$ of the follower's cost functional are unknown and thus the leader will attempt to estimate them. Let $\hat{R}_F^{(i)}$ and $\hat{S}_F^{(i)}$ denote the leader's estimates of the matrices $R_F$ and $S_F$ at stage i. There are actually only five unknown elements in the matrices $R_F$ and $S_F$ since they are both symmetric and can be scaled such that $R_{F11} = 1$. Let us stack the unknown elements of $R_F$ and $S_F$ into a vector $\alpha$. Then $\hat{\alpha}^{(i)}$ is the stacked vector representing the estimates $\hat{R}_{F12}^{(i)}$, $\hat{R}_{F22}^{(i)}$, $\hat{S}_{F11}^{(i)}$, $\hat{S}_{F12}^{(i)}$, and $\hat{S}_{F22}^{(i)}$ of these unknown elements at stage i.


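Consider an input of the form

$$u^{(i)} = u^t + D(\hat{\alpha}^{(i)})\,(v^{(i)} - v^t) \tag{3.2}$$

where

$$D(\hat{\alpha}^{(i)}) = \begin{bmatrix} \hat{D}_{11}^{(i)} & \hat{D}_{12}^{(i)} \\ \hat{D}_{21}^{(i)} & \hat{D}_{22}^{(i)} \end{bmatrix}$$

is now a 2x2 matrix. From (3.2) we see that in general each of the leader's inputs $u_1^{(i)}$ and $u_2^{(i)}$ depends upon both of the follower's input deviations from the desired input.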
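The leader may now use the follower's estimated cost functional

$$\hat{J}_F^{(i)} = u' \hat{R}_F^{(i)} u + v' \hat{S}_F^{(i)} v \tag{3.3}$$

to calculate an estimate $\hat{v}^{(i)}$ of the follower's response $v^{(i)}$ to the incentive input structure $u^{(i)}$ of (3.2). This is done by finding the vector $\hat{v}^{(i)}$ which minimizes $\hat{J}_F^{(i)}$. From basic calculus and the chain rule, $\hat{v}^{(i)}$ is a solution to

$$\frac{\partial \hat{J}_F^{(i)}}{\partial \hat{v}^{(i)}} + \frac{\partial \hat{J}_F^{(i)}}{\partial u^{(i)}}\,\frac{\partial u^{(i)}}{\partial \hat{v}^{(i)}} = 0. \tag{3.4}$$

We may solve (3.4) using (3.2) and (3.3) to obtain

$$\hat{v}^{(i)\prime} = \left(-u^{t\prime} + v^{t\prime} D(\hat{\alpha}^{(i)})'\right) \hat{R}_F^{(i)} D(\hat{\alpha}^{(i)}) \left(\hat{S}_F^{(i)} + D(\hat{\alpha}^{(i)})' \hat{R}_F^{(i)} D(\hat{\alpha}^{(i)})\right)^{-1}. \tag{3.5}$$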


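The leader wants to choose $D(\hat{\alpha}^{(i)})$ such that $\hat{v}^{(i)} = v^t$. With this equality (3.5) reduces to

$$-u^{t\prime} \hat{R}_F^{(i)} D(\hat{\alpha}^{(i)}) = v^{t\prime} \hat{S}_F^{(i)}. \tag{3.6}$$

Expanding the matrices in (3.6) we obtain

$$-u_1^t\left(\hat{D}_{11}^{(i)} + \hat{R}_{F12}^{(i)} \hat{D}_{21}^{(i)}\right) - u_2^t\left(\hat{R}_{F12}^{(i)} \hat{D}_{11}^{(i)} + \hat{R}_{F22}^{(i)} \hat{D}_{21}^{(i)}\right) = v_1^t \hat{S}_{F11}^{(i)} + v_2^t \hat{S}_{F12}^{(i)} \tag{3.7a}$$

$$-u_1^t\left(\hat{D}_{12}^{(i)} + \hat{R}_{F12}^{(i)} \hat{D}_{22}^{(i)}\right) - u_2^t\left(\hat{R}_{F12}^{(i)} \hat{D}_{12}^{(i)} + \hat{R}_{F22}^{(i)} \hat{D}_{22}^{(i)}\right) = v_1^t \hat{S}_{F12}^{(i)} + v_2^t \hat{S}_{F22}^{(i)} \tag{3.7b}$$

which reveals a system of two equations and four unknowns $\hat{D}_{11}^{(i)}$, $\hat{D}_{12}^{(i)}$, $\hat{D}_{21}^{(i)}$, and $\hat{D}_{22}^{(i)}$. Hence our solution for $D(\hat{\alpha}^{(i)})$ is certainly non-unique and has two degrees of freedom [18].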

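To insure a unique solution we suppose the leader chooses $D(\hat{\alpha}^{(i)})$ to be a diagonal matrix. That is,

$$D(\hat{\alpha}^{(i)}) = \begin{bmatrix} \hat{D}_{11}^{(i)} & 0 \\ 0 & \hat{D}_{22}^{(i)} \end{bmatrix}. \tag{3.8}$$

Then (3.7a,b) reduces to

$$\left(-u_1^t - u_2^t \hat{R}_{F12}^{(i)}\right) \hat{D}_{11}^{(i)} = v_1^t \hat{S}_{F11}^{(i)} + v_2^t \hat{S}_{F12}^{(i)} \tag{3.9a}$$

$$\left(-u_1^t \hat{R}_{F12}^{(i)} - u_2^t \hat{R}_{F22}^{(i)}\right) \hat{D}_{22}^{(i)} = v_1^t \hat{S}_{F12}^{(i)} + v_2^t \hat{S}_{F22}^{(i)} \tag{3.9b}$$

and we have

$$\hat{D}_{11}^{(i)} = -\frac{v_1^t \hat{S}_{F11}^{(i)} + v_2^t \hat{S}_{F12}^{(i)}}{u_1^t + u_2^t \hat{R}_{F12}^{(i)}} \tag{3.10a}$$

$$\hat{D}_{22}^{(i)} = -\frac{v_1^t \hat{S}_{F12}^{(i)} + v_2^t \hat{S}_{F22}^{(i)}}{u_1^t \hat{R}_{F12}^{(i)} + u_2^t \hat{R}_{F22}^{(i)}} \tag{3.10b}$$

as the elements of the incentive matrix $D(\hat{\alpha}^{(i)})$. The optimal diagonal incentive matrix elements $D_{11}^*$ and $D_{22}^*$ can be found from (3.10a,b) by using the actual values of $R_{F12}$, $R_{F22}$, $S_{F11}$, $S_{F12}$, and $S_{F22}$.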
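
The two degrees of freedom in (3.7a,b) and the diagonal choice (3.8) can be checked numerically. The following is a minimal sketch, not part of the original development: the function names, the tuple layout of the estimates, and all numerical values are hypothetical, and the sketch assumes that neither entry of $u^{t\prime} \hat{R}_F^{(i)}$ is zero. It fixes the off-diagonal elements, solves (3.7a,b) for the diagonal ones, and evaluates the residual of (3.6).

```python
def incentive_matrix(a_hat, u_t, v_t, d12=0.0, d21=0.0):
    """Solve (3.7a,b) for D11 and D22 once the off-diagonal elements are fixed.

    a_hat = (R_F12, R_F22, S_F11, S_F12, S_F22) with R_F11 scaled to 1.
    d12 = d21 = 0 gives the diagonal choice (3.8) and formulas (3.10a,b).
    """
    r12, r22, s11, s12, s22 = a_hat
    p1 = u_t[0] + u_t[1] * r12          # first entry of u^t' R_F
    p2 = u_t[0] * r12 + u_t[1] * r22    # second entry of u^t' R_F
    rhs1 = v_t[0] * s11 + v_t[1] * s12  # first entry of v^t' S_F
    rhs2 = v_t[0] * s12 + v_t[1] * s22  # second entry of v^t' S_F
    d11 = (-rhs1 - d21 * p2) / p1
    d22 = (-rhs2 - d12 * p1) / p2
    return ((d11, d12), (d21, d22))


def residual(d, a_hat, u_t, v_t):
    """Left side minus right side of (3.6), entry by entry."""
    r12, r22, s11, s12, s22 = a_hat
    p1 = u_t[0] + u_t[1] * r12
    p2 = u_t[0] * r12 + u_t[1] * r22
    rhs1 = v_t[0] * s11 + v_t[1] * s12
    rhs2 = v_t[0] * s12 + v_t[1] * s22
    return (-(p1 * d[0][0] + p2 * d[1][0]) - rhs1,
            -(p1 * d[0][1] + p2 * d[1][1]) - rhs2)


a_hat = (0.75, 1.2, 0.75, 0.65, 1.6)   # hypothetical estimates
u_t, v_t = (1.0, 2.0), (1.0, 1.0)      # hypothetical desired inputs
diagonal = incentive_matrix(a_hat, u_t, v_t)
general = incentive_matrix(a_hat, u_t, v_t, d12=0.3, d21=-0.2)
```

Both `diagonal` and `general` satisfy (3.6) for the same estimates, which is the non-uniqueness noted above; only the diagonal choice removes the freedom in the off-diagonal elements.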


3.4.  Parameter  Estimations  and  Updating  Techniques 

The leader begins his estimation of the unknown parameters in $\alpha$ by making an initial guess $\hat{\alpha}^{(0)}$. Since we know that $R_F$ and $S_F$ are positive definite matrices it is advisable to select this initial guess $\hat{\alpha}^{(0)}$ such that the resulting matrices, $\hat{R}_F^{(0)}$ and $\hat{S}_F^{(0)}$, are both positive definite. The leader may use these parameter estimates to calculate his input structure $u^{(0)}$ via (3.2), (3.8), and (3.10a,b) and present it to the follower. The follower will optimize his cost functional (3.1b) with respect to this leader's input structure and present his input $v^{(0)}$. The subsequent value of $u^{(0)}$ may be obtained from (3.2) and each player may then compute his cost for stage zero. At this point the leader wishes to have a method for updating his parameter estimates before beginning stage one of the game. We now consider the formulation of a general updating procedure which can be used following any stage i.


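The follower's true input $v^{(i)}$ at any stage i can be found by optimizing $J_F$ in (3.1b) with respect to $v^{(i)}$ and $u^{(i)}$ as given in (3.2). By the chain rule $v^{(i)}$ is a solution to

$$\frac{\partial J_F}{\partial v^{(i)}} + \frac{\partial J_F}{\partial u^{(i)}}\,\frac{\partial u^{(i)}}{\partial v^{(i)}} = 0 \tag{3.11}$$

which leads to

$$v^{(i)\prime} = \left(-u^{t\prime} + v^{t\prime} D(\hat{\alpha}^{(i)})'\right) R_F D(\hat{\alpha}^{(i)}) \left(S_F + D(\hat{\alpha}^{(i)})' R_F D(\hat{\alpha}^{(i)})\right)^{-1}. \tag{3.12}$$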


This value of $v^{(i)}$ will produce a corresponding leader's input $u^{(i)}$ according to (3.2). Once these values are obtained, the leader may calculate his cost for step i. At this point the leader wishes to update his estimate $\hat{\alpha}^{(i)}$ by making use of the information gained during step i. Corresponding to the first order (scalar) case, this can be done by using a gradient technique based upon minimizing $J_L$.


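Since there are five estimated parameters in $\hat{\alpha}^{(i)}$, the leader must have five separate equations for updating his estimate. In general, we write

$$\hat{\alpha}^{(i+1)} = \hat{\alpha}^{(i)} - \gamma \nabla_{\hat{\alpha}^{(i)}} J_L \tag{3.13}$$

which is, in fact, the five equations

$$\hat{R}_{F12}^{(i+1)} = \hat{R}_{F12}^{(i)} - \gamma \nabla_{\hat{R}_{F12}^{(i)}} J_L \tag{3.14a}$$

$$\hat{R}_{F22}^{(i+1)} = \hat{R}_{F22}^{(i)} - \gamma \nabla_{\hat{R}_{F22}^{(i)}} J_L \tag{3.14b}$$

$$\hat{S}_{F11}^{(i+1)} = \hat{S}_{F11}^{(i)} - \gamma \nabla_{\hat{S}_{F11}^{(i)}} J_L \tag{3.14c}$$

$$\hat{S}_{F12}^{(i+1)} = \hat{S}_{F12}^{(i)} - \gamma \nabla_{\hat{S}_{F12}^{(i)}} J_L \tag{3.14d}$$

$$\hat{S}_{F22}^{(i+1)} = \hat{S}_{F22}^{(i)} - \gamma \nabla_{\hat{S}_{F22}^{(i)}} J_L. \tag{3.14e}$$
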


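Expanding the matrix equations (3.1a) and (3.12), the leader may obtain the following expressions for $J_L$, $v_1^{(i)}$, and $v_2^{(i)}$ in terms of known quantities, the parameter estimates $\hat{\alpha}^{(i)}$, and the unknown parameters $\alpha$. (Recall that $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$ were both functions of known quantities and the estimates $\hat{\alpha}^{(i)}$.)

$$\begin{aligned} J_L ={}& (\hat{D}_{11}^{(i)})^2 R_{L11}(v_1^{(i)} - v_1^t)^2 + 2\hat{D}_{11}^{(i)} \hat{D}_{22}^{(i)} R_{L12}(v_1^{(i)} - v_1^t)(v_2^{(i)} - v_2^t) + (\hat{D}_{22}^{(i)})^2 R_{L22}(v_2^{(i)} - v_2^t)^2 \\ &+ S_{L11}(v_1^{(i)} - v_1^t)^2 + 2S_{L12}(v_1^{(i)} - v_1^t)(v_2^{(i)} - v_2^t) + S_{L22}(v_2^{(i)} - v_2^t)^2 \end{aligned} \tag{3.15}$$

$$v_1^{(i)} = \frac{\left[\hat{D}_{11}^{(i)}(v_1^t \hat{D}_{11}^{(i)} - u_1^t) + R_{F12}\hat{D}_{11}^{(i)}(v_2^t \hat{D}_{22}^{(i)} - u_2^t)\right]\left(S_{F22} + R_{F22}(\hat{D}_{22}^{(i)})^2\right) - \left[R_{F12}\hat{D}_{22}^{(i)}(v_1^t \hat{D}_{11}^{(i)} - u_1^t) + R_{F22}\hat{D}_{22}^{(i)}(v_2^t \hat{D}_{22}^{(i)} - u_2^t)\right]\left(S_{F12} + \hat{D}_{11}^{(i)}\hat{D}_{22}^{(i)} R_{F12}\right)}{\left(S_{F11} + (\hat{D}_{11}^{(i)})^2\right)\left(S_{F22} + R_{F22}(\hat{D}_{22}^{(i)})^2\right) - \left(S_{F12} + \hat{D}_{11}^{(i)}\hat{D}_{22}^{(i)} R_{F12}\right)^2} \tag{3.16a}$$

$$v_2^{(i)} = \frac{\left[R_{F12}\hat{D}_{22}^{(i)}(v_1^t \hat{D}_{11}^{(i)} - u_1^t) + R_{F22}\hat{D}_{22}^{(i)}(v_2^t \hat{D}_{22}^{(i)} - u_2^t)\right]\left(S_{F11} + (\hat{D}_{11}^{(i)})^2\right) - \left[\hat{D}_{11}^{(i)}(v_1^t \hat{D}_{11}^{(i)} - u_1^t) + R_{F12}\hat{D}_{11}^{(i)}(v_2^t \hat{D}_{22}^{(i)} - u_2^t)\right]\left(S_{F12} + \hat{D}_{11}^{(i)}\hat{D}_{22}^{(i)} R_{F12}\right)}{\left(S_{F11} + (\hat{D}_{11}^{(i)})^2\right)\left(S_{F22} + R_{F22}(\hat{D}_{22}^{(i)})^2\right) - \left(S_{F12} + \hat{D}_{11}^{(i)}\hat{D}_{22}^{(i)} R_{F12}\right)^2} \tag{3.16b}$$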

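Using the chain rule and equations (3.10), (3.15), and (3.16a,b), we can find the quantities $\nabla_{\hat{R}_{F12}^{(i)}} J_L$, $\nabla_{\hat{R}_{F22}^{(i)}} J_L$, ..., $\nabla_{\hat{S}_{F22}^{(i)}} J_L$ through a lengthy, but straightforward process. For example,

$$\nabla_{\hat{R}_{F12}^{(i)}} J_L = \frac{\partial J_L}{\partial \hat{D}_{11}^{(i)}}\frac{\partial \hat{D}_{11}^{(i)}}{\partial \hat{R}_{F12}^{(i)}} + \frac{\partial J_L}{\partial \hat{D}_{22}^{(i)}}\frac{\partial \hat{D}_{22}^{(i)}}{\partial \hat{R}_{F12}^{(i)}} + \frac{\partial J_L}{\partial v_1^{(i)}}\frac{\partial v_1^{(i)}}{\partial \hat{R}_{F12}^{(i)}} + \frac{\partial J_L}{\partial v_2^{(i)}}\frac{\partial v_2^{(i)}}{\partial \hat{R}_{F12}^{(i)}} \tag{3.17}$$

where $\partial \hat{D}_{11}^{(i)}/\partial \hat{R}_{F12}^{(i)}$ and $\partial \hat{D}_{22}^{(i)}/\partial \hat{R}_{F12}^{(i)}$ are found by differentiating (3.10a,b), and $\partial J_L/\partial \hat{D}_{11}^{(i)}$, $\partial J_L/\partial \hat{D}_{22}^{(i)}$, $\partial J_L/\partial v_1^{(i)}$, and $\partial J_L/\partial v_2^{(i)}$ are found by differentiating (3.15). The equations for $v_1^{(i)}$ and $v_2^{(i)}$ in (3.16a,b) contain the unknown parameters in $\alpha$ so we must obtain $\partial v_1^{(i)}/\partial \hat{R}_{F12}^{(i)}$ and $\partial v_2^{(i)}/\partial \hat{R}_{F12}^{(i)}$ by differentiating (3.16a,b) and treating these five unknown parameters as constants. However, our resulting expressions for $\partial v_1^{(i)}/\partial \hat{R}_{F12}^{(i)}$ and $\partial v_2^{(i)}/\partial \hat{R}_{F12}^{(i)}$ will still contain these unknown parameters so we need to replace them with our corresponding parameter estimates. After making this substitution, all of the quantities on the right hand side of (3.17) will be known and we can calculate $\nabla_{\hat{R}_{F12}^{(i)}} J_L$.

Calculations of the gradient of $J_L$ with respect to each of the other four estimated parameters are similar. Once these values are computed, the leader may use (3.14a-e) to update his parameter estimates and obtain $\hat{\alpha}^{(i+1)}$, which are used to begin the next stage i+1 in the game.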
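
One stage of this updating procedure can be sketched as follows. This is only an illustration under stated assumptions, not the computation used in the thesis: the analytic chain rule (3.17) is replaced by a central finite difference, the function names and the step `h` are hypothetical, and the numerical values are invented. The substitution of estimates for the unknown parameters appears where the modelled follower response is evaluated with the weights frozen at `a_hat`.

```python
def incentive(a_hat, u_t, v_t):
    """Diagonal incentive elements from (3.10a,b); a_hat = (R12, R22, S11, S12, S22)."""
    r12, r22, s11, s12, s22 = a_hat
    d11 = -(v_t[0] * s11 + v_t[1] * s12) / (u_t[0] + u_t[1] * r12)
    d22 = -(v_t[0] * s12 + v_t[1] * s22) / (u_t[0] * r12 + u_t[1] * r22)
    return d11, d22


def response(d, a, u_t, v_t):
    """Follower's minimizing input from (3.16a,b) for diagonal D and follower weights a."""
    d11, d22 = d
    r12, r22, s11, s12, s22 = a
    w1, w2 = d11 * v_t[0] - u_t[0], d22 * v_t[1] - u_t[1]
    b1 = d11 * (w1 + r12 * w2)
    b2 = d22 * (r12 * w1 + r22 * w2)
    m11, m12, m22 = s11 + d11 * d11, s12 + d11 * d22 * r12, s22 + d22 * d22 * r22
    det = m11 * m22 - m12 * m12
    return (m22 * b1 - m12 * b2) / det, (m11 * b2 - m12 * b1) / det


def leader_cost(d, v, r_l, s_l, v_t):
    """Leader's cost (3.15); r_l and s_l hold the (11, 12, 22) elements of R_L and S_L."""
    d11, d22 = d
    e1, e2 = v[0] - v_t[0], v[1] - v_t[1]
    return ((d11 * d11 * r_l[0] + s_l[0]) * e1 * e1
            + 2.0 * (d11 * d22 * r_l[1] + s_l[1]) * e1 * e2
            + (d22 * d22 * r_l[2] + s_l[2]) * e2 * e2)


def update(a_hat, v_obs, r_l, s_l, u_t, v_t, gamma, h=1e-6):
    """One pass of (3.14a-e) with a finite-difference stand-in for (3.17)."""
    v_model = response(incentive(a_hat, u_t, v_t), a_hat, u_t, v_t)
    new = []
    for k in range(5):
        costs = []
        for sign in (-1.0, 1.0):
            a_pert = list(a_hat)
            a_pert[k] += sign * h
            d_pert = incentive(a_pert, u_t, v_t)
            # Modelled shift of the follower's input: weights frozen at a_hat,
            # i.e. the unknown parameters are replaced by their estimates.
            v_pert = response(d_pert, a_hat, u_t, v_t)
            v_shift = (v_obs[0] + v_pert[0] - v_model[0],
                       v_obs[1] + v_pert[1] - v_model[1])
            costs.append(leader_cost(d_pert, v_shift, r_l, s_l, v_t))
        grad_k = (costs[1] - costs[0]) / (2.0 * h)
        new.append(a_hat[k] - gamma * grad_k)
    return tuple(new)
```

A stage then consists of computing `incentive(a_hat, ...)`, observing the follower's input (simulated here by `response` with the true weights), and calling `update`. When the estimates already equal the true parameters, the follower plays $v^t$, the cost is zero, and the update leaves the estimates unchanged.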
3.5. General Aspects of the Higher Order Problem

In  this  section  we  will  attempt  to  address  a  few  general  questions 
pertaining  to  the  higher  order  problem  we  have  just  studied.  We  will  assess 
the  performance  of  our  algorithm  in  solving  actual  numerical  problems, 
and  discuss  some  implications  stemming  from  the  incentive  matrix 
flexibility. 


3.5.1.  Implications  of  the  incentive  matrix  flexibility 

From equations (3.7a,b) we can see that there are only two equations governing the four elements of the 2x2 incentive matrix. Consequently, the optimal incentive matrix $D(\hat{\alpha}^{(i)})$ is non-unique and has two degrees of freedom. A closer inspection of (3.7a,b) reveals that $\hat{D}_{11}^{(i)}$ and $\hat{D}_{21}^{(i)}$ are isolated in one equation while $\hat{D}_{12}^{(i)}$ and $\hat{D}_{22}^{(i)}$ are isolated in the other. This decoupling allows us to solve for the columns of the incentive matrix independently of each other [18].
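
One way to see this decoupling, written here in a compact notation that is not the thesis's own: let $d_j$ denote the $j$-th column of $D(\hat{\alpha}^{(i)})$. Taking the $j$-th entry of the row-vector relation (3.6) gives

$$-\,u^{t\prime} \hat{R}_F^{(i)} d_j = \left(v^{t\prime} \hat{S}_F^{(i)}\right)_j, \qquad j = 1, 2,$$

so each column satisfies one scalar equation in its own two elements and is left with one degree of freedom. For $j = 1$ this is (3.7a) and for $j = 2$ it is (3.7b).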

If we allow the incentive matrix to be a general 2x2 matrix, then each of the leader's inputs $u_1$ and $u_2$ will depend on both of the follower's inputs $v_1$ and $v_2$. However, if the incentive matrix is restricted to a diagonal structure, as it was throughout this chapter, then there is a slight partitioning of the control space. In this case $u_1$ depends only upon $v_1$ and $u_2$ depends only upon $v_2$.

Although we selected a diagonal incentive matrix for simplicity and to insure a unique solution, there are certainly other meaningful ways to exercise the two degrees of freedom. One of the most useful methods would be to select the optimal incentive matrix in a manner such that the optimal incentive control is robust to parameter estimation errors [15]. Specifically, the leader would like to minimize the sensitivity of his cost functional $J_L$ to variations in the parameter estimates $\hat{\alpha}^{(i)}$. This method has been studied for the deterministic problem [11], and its ideas should be extendable to the problem at hand.

To reduce the estimation effort, we also investigated the possibility of limiting our estimations strictly to the elements of the incentive matrix D. This approach was appealing because in the case of a diagonal incentive matrix it reduced the number of estimated elements from five ($\hat{R}_{F12}$, $\hat{R}_{F22}$, $\hat{S}_{F11}$, $\hat{S}_{F12}$, and $\hat{S}_{F22}$) to two ($\hat{D}_{11}$ and $\hat{D}_{22}$). Even with the most general form of the incentive matrix the number of estimated elements is reduced from five to four. Although the concept seemed promising, we were unable to find a suitable method for directly updating the estimates of $\hat{D}_{11}$ and $\hat{D}_{22}$ due to a lack of information about the unknown and unestimated system parameters. Therefore, we adopted our present approach of attempting to estimate all of the unknown system parameters, and then constructing the optimal incentive matrix at each stage as a function of these estimates.

3.5.2.  Performance  analysis  of  the  algorithm 

Unlike  the  scalar  case,  we  have  thus  far  been  unable  to  analytically 
prove  that  the  iterative  method  described  in  this  chapter  will  indeed 
produce  a  sequence  of  incentive  control  structures  which  converge  to  the 
optimal  incentive  control.  Nevertheless,  extensive  simulation  studies  utilizing 
this  iterative  method  have  produced  some  good  convergence  results. 

Since our parameter estimates are updated by using a gradient method on $J_L$, we expect the value of $J_L$ to decrease at each successive stage of the game. We also expect $J_L$ to converge to its optimal value of zero unless it settles at a local minimum. Unfortunately, we must remark that we cannot be totally assured of a monotonically decreasing $J_L$, because strictly speaking, our updating technique is not a true gradient technique. Recall that our calculation of expressions such as $\partial v_1^{(i)}/\partial \hat{R}_{F12}^{(i)}$ and $\partial v_2^{(i)}/\partial \hat{R}_{F12}^{(i)}$ used in the updating technique involved some of the unknown system parameters $\alpha$. This prevented us from calculating expressions which were crucial to the updating process. To circumvent this problem, we replaced the unknown parameters $\alpha$ with their most recent estimates $\hat{\alpha}^{(i)}$, enabling us to calculate the necessary expressions and continue our iterative process. This substitution is of small consequence provided the values of $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$ computed from the estimates are close to their optimal values. It does create greater difficulties in cases where the computed values of $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$ are less accurate estimates of their optimal values.

Our simulation studies appear to confirm the "near-gradient" nature of our updating scheme. For a relatively good initial guess $\hat{\alpha}^{(0)}$, i.e., a guess such that $\hat{D}_{11}^{(0)}$ and $\hat{D}_{22}^{(0)}$ are "reasonably close" to their optimal values $D_{11}^*$ and $D_{22}^*$, our scheme does indeed behave similarly to a gradient technique and the resulting leader's cost $J_L$ decreases monotonically to zero. However, for a less fortunate initial guess $\hat{\alpha}^{(0)}$, our scheme deviates from the behavior of a gradient technique and does not produce a monotonically decreasing $J_L$. In some of these cases our scheme actually exhibits a behavior characteristic of dual control [12] by sacrificing the minimization of $J_L$ over an initial iterative period in order to gain better estimates of the unknown parameters. Once the scheme achieves parameter estimates which provide sufficiently accurate estimates of $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$, it then proceeds to generate control inputs which result in a monotonically decreasing $J_L$. An example of this phenomenon is illustrated in Figure 3.1. From this figure we can see that the leader's cost actually increases for approximately the first ten iterations before beginning its monotonic descent towards zero.

Simulation studies have also revealed that our scheme will not necessarily provide accurate estimates for the actual values of $\alpha$. In cases where our scheme successfully generates the optimal incentive control, the leader's cost functional approaches zero and the generated values $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$ approach their optimal values $D_{11}^*$ and $D_{22}^*$, respectively. Consequently, the follower's actual inputs $v_1^{(i)}$ and $v_2^{(i)}$ also approach their desired values of $v_1^t$ and $v_2^t$. In general though, the estimates $\hat{\alpha}^{(i)}$ do not necessarily approach $\alpha$. In fact, the estimates of some elements of $\alpha$ may converge to values far from their actual value. From this we can see that our scheme is not actually attempting to estimate $\alpha$, but merely uses the estimates $\hat{\alpha}^{(i)}$ as a vehicle through which it can compute and adjust the estimations $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$. Thus, although we were unable to accurately estimate the elements of the optimal diagonal incentive matrix by themselves, as discussed in the previous section, we are able to accurately estimate these elements by expressing them as functions of estimates of the unknown system parameters.

The selection of the step size $\gamma$ is crucial in applying our iterative scheme. It must be chosen small enough to avoid large overshoots in the parameter estimations, but large enough to provide a sufficient adjustment in the incentive matrix. This will insure an adequate rate of convergence. In the event that the scheme becomes bogged down at an undesired local minimum, the authors propose restarting the algorithm with an entirely new initial guess $\hat{\alpha}^{(0)}$. No recommendations are made regarding restarting the algorithm with an initial guess modified from the previously unsuccessful initial guess.

[Figure 3.1: "... effect" behavior of our algorithm]

3.5.3.  Generalization  of  the  algorithm  to  higher  dimensional  problems 

In this chapter we have developed an algorithm which iteratively computes incentive input controls for the leader in a static, second order Stackelberg game with unknown cost functionals. This algorithm was developed without using any inherent properties of the second order problem, leading us to believe that it can be extended to a general n-th order problem. In a general n-th order problem, equations (3.7a,b) will represent a system of $n$ equations and $n^2$ unknowns. This results in $n(n-1)$ degrees of freedom in selecting the optimal incentive matrix. We can once again be assured of a unique solution by considering only diagonal incentive matrices. Furthermore, we believe that this solution can be computed by applying a generalized algorithm similar to the one described in this chapter.
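
As a sketch of how the diagonal solution might look in the n-th order case, assume that the n-th order analogue of (3.6) keeps the same form, $-u^{t\prime} \hat{R}_F D = v^{t\prime} \hat{S}_F$, with symmetric $n \times n$ estimates and no zero entry in $\hat{R}_F u^t$. Under that assumption each diagonal element follows from one scalar equation. The function name and the numerical values below are hypothetical.

```python
def diagonal_incentive(r_hat, s_hat, u_t, v_t):
    """Diagonal elements D_jj = -(S v^t)_j / (R u^t)_j for symmetric n x n estimates.

    Assumes the n-th order analogue of (3.6) and that no entry of R u^t is zero.
    """
    n = len(u_t)
    ru = [sum(r_hat[j][k] * u_t[k] for k in range(n)) for j in range(n)]
    sv = [sum(s_hat[j][k] * v_t[k] for k in range(n)) for j in range(n)]
    return [-sv[j] / ru[j] for j in range(n)]


def degrees_of_freedom(n):
    """n*n unknown elements of D minus the n equations of the analogue of (3.6)."""
    return n * n - n


# Hypothetical third order data (symmetric, positive definite weights).
r_hat = [[1.0, 0.2, 0.1], [0.2, 1.5, 0.3], [0.1, 0.3, 2.0]]
s_hat = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]
d = diagonal_incentive(r_hat, s_hat, (1.0, 1.0, 1.0), (1.0, 2.0, 3.0))
```

For n = 2 the formula reduces to (3.10a,b), and `degrees_of_freedom(2)` returns the two degrees of freedom found in Section 3.3.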


CHAPTER  4 


SIMULATION  EXAMPLES 

In  the  previous  two  chapters  we  have  discussed  methods  for 
deriving  optimal  incentive  controls  for  two-player  Stackelberg  games  with 
unknown  cost  functionals.  We  now  demonstrate  the  results  of  these 
methods  via  a  few  numerical  examples.  The  first  example  is  a  realistic 
economic  problem  which  uses  the  theory  of  Chapter  2  and  implements  the 
error  function  method  for  updating  parameter  estimations.  The  second 
example  demonstrates  the  algorithm  described  in  Chapter  3  applied  to  a  second 
order  problem. 


4.1.  A  Scalar  Economic  Example 


Consider the following economic problem [20,21] illustrated in Figure 4.1.

A monopoly M operates in a market with a demand curve specified by $p = A_1 - A_2 q$ and with a flat marginal cost curve MC = C dollars/unit. The government, which does not know the value of the parameters $A_1$ or $A_2$, wishes to regulate this monopoly in such a way that its production output will be equal to $q^*$, the same quantity which would be produced in a purely competitive market. We do assume, through market surveys or estimates, that the government does have knowledge of the current operating point $(q_m, p_m)$. The government regulation may be either a tax or a subsidy, and can be applied in either a lump sum or per quantity method.

Let $p$ = price and $q$ = quantity produced. We have a demand curve specified by

$$p = A_1 - A_2 q. \tag{4.1}$$

[Figure 4.1. Government regulation of a monopoly. Labels: Current Operating Point, Quantity.]


By economic theory [21] we know that the marginal revenue (MR) curve has the same intercept and twice the slope of the demand curve. Thus,

$$MR = A_1 - 2A_2 q. \tag{4.2}$$

Since we know that $(q_m, p_m)$ must be a point on the demand curve, we can express $A_1$ as a function of $A_2$ and thus eliminate one of the unknown parameters. From (4.1) we have

$$A_1 = p_m + A_2 q_m. \tag{4.3}$$


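We know that the monopolist will produce at a level where MR = MC. Let us assume that the government decides to provide a subsidy of $S$ dollars per unit produced. The subsidy adds $S$ to the marginal revenue, and at the competitive output $q^*$ the price equals the marginal cost, so that $C = A_1 - A_2 q^*$. If $A_2$ were known, then

$$MR = MC \;\Rightarrow\; A_1 + S - 2A_2 q^* = A_1 - A_2 q^* \;\Rightarrow\; S = A_2 q^* \tag{4.4}$$

would be the optimum subsidy. However, $A_2$ is unknown so the government will attempt to estimate it with $\hat{A}_2^{(i)}$. Consider the incentive structured subsidy

$$S^{(i)} = \hat{A}_2^{(i)} q^* + D(\hat{A}_2^{(i)})\,(q - q^*). \tag{4.5}$$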


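Given this subsidy, the monopolist will still produce at the level of output which maximizes his profit. To estimate this level of production $\hat{q}^{(i)}$, the government solves (using $\hat{A}_2^{(i)}$ as an estimate for $A_2$)

$$\frac{d\hat{J}^{(i)}}{dq^{(i)}} = 0 \tag{4.6}$$

where

$$\hat{J}^{(i)} = q^{(i)}(\text{PRICE} - \text{COST}) = q^{(i)}\left(\hat{A}_2^{(i)} q_m + p_m - \hat{A}_2^{(i)} q^{(i)} + S^{(i)} - C\right) \tag{4.7}$$

is the monopolist's total profit. Solving (4.6) by using (4.7) we obtain

$$\hat{q}^{(i)} = \frac{C - p_m - \hat{A}_2^{(i)}(q_m + q^*) + D(\hat{A}_2^{(i)})\,q^*}{2\left(D(\hat{A}_2^{(i)}) - \hat{A}_2^{(i)}\right)} \tag{4.8}$$

as an estimate for the monopolist's production level. Setting $\hat{q}^{(i)}$ in (4.8) equal to the desired output $q^*$, the government calculates its optimal incentive constant to be

$$D(\hat{A}_2^{(i)}) = \frac{\hat{A}_2^{(i)}(q^* - q_m) + C - p_m}{q^*}. \tag{4.9}$$

However, since our estimate $\hat{A}_2^{(i)}$ of $A_2$ is probably incorrect, the monopolist's true production level will probably differ from $q^*$. Solving (4.6) and (4.7) with the actual value of $A_2$ we find that the monopolist actually produces

$$q^{(i)} = \frac{q^*\left[2(C - p_m) - (A_2 + \hat{A}_2^{(i)})\,q_m\right]}{2\left[C - p_m + \hat{A}_2^{(i)}(q^* - q_m) - A_2 q^*\right]} \tag{4.10}$$

units.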


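Now consider the positive definite error function

$$E_i = \tfrac{1}{2}\left(q^* - q^{(i)}\right)^2. \tag{4.11}$$

The government may update its estimates for $A_2$ by using a gradient method

$$\hat{A}_2^{(i+1)} = \hat{A}_2^{(i)} - \gamma \nabla_{\hat{A}_2^{(i)}} E_i = \hat{A}_2^{(i)} - \gamma\,(q^{(i)} - q^*)\,\nabla_{\hat{A}_2^{(i)}} q^{(i)}. \tag{4.12}$$

Differentiating (4.10) and using $\hat{A}_2^{(i)}$ as an estimate for $A_2$, we have

$$\nabla_{\hat{A}_2^{(i)}} q^{(i)} = \frac{q^*(q_m - 2q^*)}{2\left(C - p_m - \hat{A}_2^{(i)} q_m\right)} \tag{4.13}$$

which is substituted into (4.12) to generate the final form of the updating equation

$$\hat{A}_2^{(i+1)} = \hat{A}_2^{(i)} - \gamma\,(q^{(i)} - q^*)\,\frac{q^*(q_m - 2q^*)}{2\left(C - p_m - \hat{A}_2^{(i)} q_m\right)}. \tag{4.14}$$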


In this example let our variables take on the following numerical values

A_1 = $450
A_2 = $1.50/unit
q_m = 130 units
p_m = $255/unit
p* = cost C = $60/unit
q* = 260 units
gamma = .00001
sigma = 3.0.

Then the updating equation (4.14) takes the form

$$\hat{A}_2^{(i+1)} = \hat{A}_2^{(i)} + \gamma\,(q^{(i)} - q^*)\,\frac{(260)(195)}{-195 - 130\hat{A}_2^{(i)}}. \tag{4.15}$$


Assume that the government makes an initial guess of $\hat{A}_2^{(0)} = .65$. By simulation with a small amount (35 dB S/N ratio) of noise, we obtain the results illustrated in Figures 4.2-4.4. Figure 4.2 illustrates the estimate $\hat{A}_2^{(i)}$ at each stage i. In this case $\hat{A}_2^{(i)}$ converges to the actual value of $A_2$ rather quickly (in approximately twenty iterations). Using these estimates $\hat{A}_2^{(i)}$ to implement the incentive structured subsidy computed from (4.5) and (4.9), the resulting quantity produced and market price at each stage i are displayed in Figures 4.3 and 4.4, respectively.
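
The iteration of this example can be reproduced in a few lines. The sketch below is a noise-free illustration rather than the simulation reported here: it omits the 35 dB noise, the function name `simulate` and the number of stages are hypothetical, and the monopolist's output is computed directly from the first-order condition of his profit under the subsidy (4.5).

```python
def simulate(a2_true=1.5, a2_hat=0.65, q_m=130.0, p_m=255.0, cost=60.0,
             q_star=260.0, gamma=1e-5, stages=100):
    """Noise-free run of the subsidy update; returns the final estimate and a history."""
    history = []
    for _ in range(stages):
        # Incentive constant (4.9) computed from the current estimate.
        d = (a2_hat * (q_star - q_m) + cost - p_m) / q_star
        # Monopolist's profit-maximizing output under the subsidy (4.5),
        # obtained with the true slope a2_true.
        q = ((cost - p_m - a2_true * q_m - a2_hat * q_star + d * q_star)
             / (2.0 * (d - a2_true)))
        # Derivative of the output with respect to the estimate, with the
        # unknown slope replaced by its estimate as in (4.13).
        dq = q_star * (q_m - 2.0 * q_star) / (2.0 * (cost - p_m - a2_hat * q_m))
        history.append((a2_hat, q))
        # Gradient update (4.12) on the error function (4.11).
        a2_hat = a2_hat - gamma * (q - q_star) * dq
    return a2_hat, history


final_a2, history = simulate()
```

With the values listed above, the first stage produces roughly 174 units and the estimate rises monotonically toward 1.5, with the error shrinking by a factor of about 0.8 per stage, which is consistent with the roughly twenty iterations quoted for the noisy simulation.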


Figure  4.3.  Time  response  of  the  quantity  of  goods  produced  q 




4.2. A Second Order Example


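Consider the following second order numerical example

$$J_L = (u - u^t)' R_L (u - u^t) + (v - v^t)' S_L (v - v^t) \tag{4.16a}$$

$$J_F = u' R_F u + v' S_F v \tag{4.16b}$$

where

$$u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$$

is the leader's 2x1 input vector and

$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$

is the follower's 2x1 input vector.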


Suppose  we  have  the  following  values 


1.0  .63 


.63  1.8 


1.0  .75 


.75  1.2


.86  1.1 


1.1  2.5 


.75  .65 


.65  1.6 


(4.17)


(4.18) 


(4.19) 


and the leader, who does not know the contents of the matrices $R_F$ and $S_F$, wishes to apply an optimal incentive input

$$u^{(i)} = u^t + D(\hat{\alpha}^{(i)})\,(v^{(i)} - v^t) \tag{4.20}$$

where $D(\hat{\alpha}^{(i)})$ is a diagonal matrix.


From equations (3.10a,b) and (4.19) we find the optimal diagonal incentive matrix elements to be $D_{11}^* = -.8603$ and $D_{22}^* = -1.417$. Thus,

$$D^* = \begin{bmatrix} -.8603 & 0 \\ 0 & -1.417 \end{bmatrix}$$

is the optimal diagonal incentive matrix. Using the algorithm described in Chapter 3 along with the initial guesses

$$\hat{R}_F^{(0)} = \begin{bmatrix} 1.0 & 1.1 \\ 1.1 & 2.0 \end{bmatrix}, \qquad \hat{S}_F^{(0)} = \begin{bmatrix} 1.5 & .8 \\ .8 & 1.0 \end{bmatrix} \tag{4.22}$$

and a step size $\gamma = .002$, and applying at each stage i the incentive control calculated from (3.2), (3.8), and (3.10), we obtained the simulation results displayed in Figures 4.5-4.9. These results include a small amount (45 dB signal/noise ratio) of noise. Figure 4.5 displays the leader's cost $J_L$ incurred at each stage, while Figures 4.6-4.7 illustrate the values of $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$ calculated at each stage. We can see that $J_L$ approaches its optimal value of zero and $\hat{D}_{11}^{(i)}$ and $\hat{D}_{22}^{(i)}$ each approach their optimal values. The resulting components $v_1^{(i)}$ and $v_2^{(i)}$ of the follower's input are displayed in Figures 4.8 and 4.9, respectively. These plots also approach their desired values of $v_1^t$ and $v_2^t$.

[Figure: response of the incentive matrix element]

[Figure: response of the follower's input]


CHAPTER  5 


SUMMARY  AND  CONCLUDING  REMARKS 


In this thesis we have used the certainty equivalence approach and the theory of self-tuning regulators to derive an iterative method which generates an optimal incentive control for the leader in a static, two-player Stackelberg game with unknown cost functionals. The method uses all available degrees of freedom to restrict the incentive matrix to a diagonal structure. This restriction assures the leader of a unique optimal incentive control. Convergence to the optimal incentive control has been proven for the scalar problem and simulation studies have shown good convergence results for the second order problem. We believe that this method is extendable in its present form to a general n-th order problem.

In Chapter 4 we applied our iterative method to a scalar economic example involving government regulation of a monopoly. A simulation study of the problem revealed that the desired regulation was indeed achieved. We also demonstrated the effectiveness of the method on a general second order numerical problem.

Future research regarding application of optimal incentive controls to Stackelberg games with unknown cost functionals may now focus on two general areas. Starting with the iterative method detailed in this thesis, one may abandon the diagonal incentive matrix structure and attempt to use the resulting degrees of freedom to satisfy other useful criteria. An example of this is given by the minimum sensitivity design approach mentioned earlier. It is also desirable to try to extend the existing methods to dynamical systems and to problems involving more than two players.